Neural Voice Cloning with a Few Samples

Authors

Sercan Ömer Arik

Jitong Chen

Kainan Peng

Wei Ping

Yanqi Zhou

Abstract

Voice cloning is a highly desired feature for personalized speech interfaces. Neural-network-based speech synthesis has been shown to generate high-quality speech for a large number of speakers. In this paper, we introduce a neural voice cloning system that takes a few audio samples as input. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding is based on training a separate model to directly infer a new speaker embedding from cloning audios, which is then used with a multi-speaker generative model. In terms of the naturalness of the speech and its similarity to the original speaker, both approaches can achieve good performance, even with very few cloning audios. While speaker adaptation can achieve better naturalness and similarity, the cloning time and required memory for the speaker encoding approach are significantly lower, making it favorable for low-resource deployment.
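The contrast between the two approaches in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the paper's architecture: the "generative model" is a fixed 3x2 linear map `W`, the "speaker encoder" `ENCODER` is simply the hard-coded pseudo-inverse of `W` standing in for a separately trained network, and the adaptation step updates only a speaker embedding rather than fine-tuning the whole model. The names `generate`, `adapt_embedding`, and `encode_speaker` are hypothetical.

```python
import random

# Hypothetical frozen "multi-speaker generative model": a fixed linear map
# from a 2-D speaker embedding to 3 acoustic features. Purely illustrative.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical "pre-trained speaker encoder": the pseudo-inverse of W,
# hard-coded so the example needs no third-party libraries.
ENCODER = [[2 / 3, -1 / 3, 1 / 3], [-1 / 3, 2 / 3, 1 / 3]]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def generate(embedding):
    """Toy generative model: features produced for a speaker embedding."""
    return matvec(W, embedding)


def adapt_embedding(samples, steps=500, lr=0.05):
    """Speaker adaptation (simplified): iterative gradient descent on the
    embedding so that generated features match the cloning samples."""
    w_t = [list(col) for col in zip(*W)]
    embedding = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for target in samples:
            residual = [p - t for p, t in zip(generate(embedding), target)]
            # Gradient of the mean squared error with respect to the embedding.
            for i, g in enumerate(matvec(w_t, residual)):
                grad[i] += 2.0 * g / len(samples)
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


def encode_speaker(samples):
    """Speaker encoding (simplified): one forward pass per cloning sample,
    then an average of the per-sample embedding estimates."""
    estimates = [matvec(ENCODER, s) for s in samples]
    return [sum(vals) / len(estimates) for vals in zip(*estimates)]


# Five noisy "cloning samples" from an unseen speaker (synthetic data).
rng = random.Random(0)
true_embedding = [0.5, -1.0]
cloning_samples = [
    [x + rng.gauss(0.0, 0.01) for x in generate(true_embedding)]
    for _ in range(5)
]

adapted = adapt_embedding(cloning_samples)
encoded = encode_speaker(cloning_samples)
print("adapted:", adapted)
print("encoded:", encoded)
```

In this linear toy both routes land on the same least-squares embedding, but adaptation needs hundreds of gradient steps per new speaker while encoding needs a single pass per sample. That loosely mirrors the cloning-time trade-off the abstract describes; it says nothing about the naturalness or similarity results, which depend on the real models.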


Similar resources

Artificial Neural Networks and Support Vector Machines for Parkinson Disease Detection using Human Voice

Artificial neural networks (ANN) with tansig, logsig, and purelin transfer functions, support vector machines (SVM), and linear and quadratic classifiers are used in this work to detect Parkinson's disease from voice features. In Parkinson's disease, a person's voice changes because of tremor in the voice-box muscles. A total of 195 phonations were used for the analysis from twenty th...


Text-Dependent Speaker Verification System Using Neural Network

This paper presents the use of a backpropagation neural network to implement voice recognition. The focus is on identifying the voice patterns of different people so that their voices can be recognized electronically. The signals corresponding to a text phrase spoken by a group of people are recorded in voice files on a computer using sound recording software. The information in these files is converted from time do...


On Using Backpropagation for Speech Texture Generation and Voice Conversion

Inspired by recent work on neural network image generation which relies on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching statistics of neuron activations between different source an...


Tone Quality Improvement of Bone Conduction Voice by Cepstrum-based Local Conversion Models

A novel tone quality improvement method for bone conduction voice is presented. In the present method, the tone quality of the bone conduction voice is converted to resemble that of the air conduction voice. For the voice conversion, the present method uses a codebook, which consists of various paired code vectors of the bone and air conduction voices. The delta and mel-cepstral coeffici...


A Study of the Criminalization of Human Cloning in Iranian Law

Cloning, as a new technology, has attracted the attention of statesmen, physicians, lawyers and other scientific communities. This phenomenon both opens a new horizon on human society regarding its therapeutic features and brings some concerns to it. This technology is divided into two parts: Human or generative cloning and therapeutic or investigative cloning. The first meaning which comes to ...



Journal: CoRR

Volume: abs/1802.06006

Publication date: 2018
